 energy function


Detecting Metastable Basins in High Dimensions via Marginal Trajectory Distribution Discrimination

arXiv.org Machine Learning

We study the problem of identifying dynamically distinct basins of attraction in high dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems, where the process rapidly mixes within basins while transitions between basins occur rarely on the timescale of interest, or even when the state space is reducible. Existing approaches typically rely on spatial discretization or spectral analysis of estimated transition operators, which can become unreliable in high dimensional settings or when the underlying basin geometry is highly nonlinear. We propose a discriminative approach to basin identification based on marginal trajectory distribution comparison. We prove a simple risk separation result: if two initial states belong to the same basin, the Bayes-optimal classifier distinguishing their marginal trajectory distributions achieves risk close to 1/2, whereas if they lie in distinct basins, the optimal risk is close to zero. This observation reduces basin detection to a two-sample discrimination problem between marginal trajectory distributions. Motivated by this principle, we develop a neural algorithm that receives a set of candidate basin representatives and iteratively merges them by estimating classification risk with a neural network that approximates the Bayes classifier. We evaluate the method on various metastable systems. These include synthetic systems constructed by embedding low-dimensional dynamics into high dimensional noisy ambient spaces. In these settings, standard spectral and clustering-based methods often fail, while our approach accurately recovers the underlying basin structure. These results display a shortcoming of existing methods and highlight trajectory discrimination as an effective tool for identifying dynamical basins in high dimensional stochastic systems.
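The risk separation idea can be illustrated on a toy system. The sketch below is not the paper's neural algorithm. It assumes a one-dimensional double-well potential V(x) = (x^2 - 1)^2 simulated with Euler-Maruyama, compares only the marginal distribution at a single final time, and uses a threshold classifier (a decision stump) as a stand-in for the neural approximation of the Bayes classifier. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_endpoints(x0, n_traj, rng, beta=5.0, dt=0.01, n_steps=200):
    """Euler-Maruyama for dx = -V'(x) dt + sqrt(2/beta) dW with the
    assumed toy potential V(x) = (x^2 - 1)^2; returns states at the final time."""
    x = np.full(n_traj, float(x0))
    noise_scale = np.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        drift = -4.0 * x * (x ** 2 - 1.0)  # -V'(x)
        x = x + drift * dt + noise_scale * rng.standard_normal(n_traj)
    return x

def fit_stump(a, b):
    """Pick the threshold and orientation with the lowest training error.
    Samples in `a` carry label 0, samples in `b` carry label 1."""
    samples = np.concatenate([a, b])
    labels = np.concatenate([np.zeros(len(a)), np.ones(len(b))])
    best_err, best_t, best_sign = np.inf, 0.0, 1
    for t in np.quantile(samples, np.linspace(0.01, 0.99, 99)):
        for sign in (1, -1):
            pred = (sign * (samples - t) > 0).astype(float)
            err = np.mean(pred != labels)
            if err < best_err:
                best_err, best_t, best_sign = err, t, sign
    return best_t, best_sign

def held_out_risk(a, b):
    """Fit on the first half of each sample, report balanced error on the rest."""
    half = len(a) // 2
    t, sign = fit_stump(a[:half], b[:half])
    err_a = np.mean(sign * (a[half:] - t) > 0)   # label 0 predicted as 1
    err_b = np.mean(sign * (b[half:] - t) <= 0)  # label 1 predicted as 0
    return 0.5 * (err_a + err_b)

rng = np.random.default_rng(0)
n = 2000
# Two starting points inside the right-hand well.
risk_same = held_out_risk(simulate_endpoints(0.8, n, rng),
                          simulate_endpoints(1.2, n, rng))
# One starting point in each well.
risk_diff = held_out_risk(simulate_endpoints(-1.0, n, rng),
                          simulate_endpoints(1.0, n, rng))
print(f"same basin risk: {risk_same:.3f}, distinct basins risk: {risk_diff:.3f}")
```

In this toy setting the held-out risk for the same-basin pair should sit near 1/2 and the risk for the distinct-basin pair near zero, which is the signal a merging procedure would threshold on.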


Revisiting Transformer Layer Parameterization Through Causal Energy Minimization

arXiv.org Machine Learning

Transformer blocks typically combine multi-head attention (MHA) for token mixing with gated MLPs for token-wise feature transformation, yet many choices in their parameterization remain largely empirical. We introduce Causal Energy Minimization (CEM), a framework that recasts Transformer layers as optimization steps on conditional energy functions while explicitly accounting for layer parameterization. Extending prior energy-based interpretations of attention, CEM shows that weight-tied MHA can be derived as a gradient update on an interaction energy, and that a gated MLP with shared up/down projections can be viewed through an element-wise energy. This perspective identifies a design space for Transformer layers that includes within-layer weight sharing, diagonal-plus-low-rank interactions, lightweight preconditioners, and recursive updates. We evaluate CEM-derived layers in language-modeling experiments at the moderate hundred-million-parameter scale. Despite their constrained parameterizations, these layers train stably and can match corresponding Transformer baselines. Overall, our results suggest that CEM provides a useful lens for understanding Transformer layer parameterization, connecting Transformer architectures to energy-based models and motivating further exploration of energy-guided layer designs.
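A small numerical check can make the energy view of attention concrete. The sketch below does not reproduce the CEM energies from the paper. It assumes the standard log-sum-exp interaction energy E(q) = -(1/beta) log sum_j exp(beta k_j . q), whose negative gradient with respect to the query is a softmax-weighted sum of the keys, i.e. an attention readout with values tied to keys. The function names are hypothetical.

```python
import numpy as np

def energy(q, K, beta=1.0):
    """Assumed log-sum-exp interaction energy of a query q against keys K."""
    scores = beta * (K @ q)
    m = scores.max()  # subtract the max for numerical stability
    return -(m + np.log(np.exp(scores - m).sum())) / beta

def attention_readout(q, K, beta=1.0):
    """Softmax attention of q over K with values tied to the keys."""
    scores = beta * (K @ q)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ K

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 4))  # 5 keys of dimension 4
q = rng.standard_normal(4)

grad = numerical_grad(lambda v: energy(v, K), q)
readout = attention_readout(q, K)

# If -grad E(q) equals the attention readout, this gap is numerically zero.
max_gap = np.max(np.abs(grad + readout))

# One gradient step on the energy moves q toward the attention output.
step_size = 0.5
q_next = q - step_size * grad
```

Under this assumed energy, a gradient step `q - step_size * grad` is a residual update by the weight-tied attention output, which is the kind of correspondence the abstract describes.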


DISCS: A Benchmark for Discrete Sampling

Neural Information Processing Systems

Sampling in discrete spaces, with critical applications in simulation and optimization, has recently been boosted by significant advances in gradient-based approaches that exploit modern accelerators like GPUs. However, two key challenges hinder the further research progress in discrete sampling. First, since there is no consensus on experimental settings, the empirical results in different research papers are often not comparable. Secondly, implementing samplers and target distributions often requires a nontrivial amount of effort in terms of calibration, parallelism, and evaluation. To tackle these challenges, we propose DISCS (DISCrete Sampling), a tailored package and benchmark that supports unified and efficient implementation and evaluations for discrete sampling in three types of tasks: sampling for classical graphical models, combinatorial optimization, and energy based generative models. Throughout the comprehensive evaluations in DISCS, we acquired new insights into scalability, design principles for proposal distributions, and lessons for adaptive sampling design.




Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation

Neural Information Processing Systems

Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Previous work focused on supervised learning methods for small molecules where binding affinity data is abundant, but it is hard to apply the same strategy to other ligand classes like antibodies where labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching (DSM) and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network for SE(3) DSM called Neural Euler's Rotation Equations (NERE). It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. Using two protein-ligand and antibody-antigen binding affinity prediction benchmarks, we show that NERE outperforms all unsupervised baselines (physics-based potentials and protein language models) in both cases and surpasses supervised baselines in the antibody case.
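The relation between energy, force, and torque mentioned above can be sketched with a toy potential. This is not the NERE network. It assumes a hypothetical pairwise energy 0.5 * (distance - d0)^2 between every ligand and protein atom, uses the physics sign convention (force is the negative gradient of the energy), and measures torque about the ligand centroid. All names and the value of `d0` are illustrative.

```python
import numpy as np

def toy_energy(ligand, protein, d0=1.5):
    """Assumed toy energy: harmonic penalty on every ligand-protein distance."""
    diff = ligand[:, None, :] - protein[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 0.5 * np.sum((dist - d0) ** 2)

def ligand_forces(ligand, protein, d0=1.5):
    """Force on each ligand atom: negative gradient of toy_energy
    with respect to that atom's coordinates (analytic form)."""
    diff = ligand[:, None, :] - protein[None, :, :]
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    grad = np.sum((dist - d0) * diff / dist, axis=1)
    return -grad

def net_force_and_torque(ligand, protein, d0=1.5):
    """Net force on the ligand and net torque about its centroid."""
    f = ligand_forces(ligand, protein, d0)
    center = ligand.mean(axis=0)
    torque = np.sum(np.cross(ligand - center, f), axis=0)
    return f.sum(axis=0), torque

rng = np.random.default_rng(0)
protein = rng.standard_normal((6, 3))        # 6 protein atoms
ligand = rng.standard_normal((4, 3)) + 2.0   # 4 ligand atoms, shifted away

forces = ligand_forces(ligand, protein)
net_force, net_torque = net_force_and_torque(ligand, protein)
print("net force:", net_force)
print("net torque:", net_torque)
```

A rigid-body update would translate the ligand along the net force and rotate it about the axis given by the net torque. How NERE turns these quantities into a predicted rotation is specified in the paper, not in this sketch.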



Towards understanding retrosynthesis by energy-based models

Neural Information Processing Systems

Retrosynthesis is the process of identifying a set of reactants to synthesize a target molecule. It is critical to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. However, the inner connections of these models are rarely discussed, and rigorous evaluations of these models are largely in need.